Better null models for assessing predictive accuracy of disease models

Abstract

Null models provide a critical baseline for the evaluation of predictive disease models. Many studies consider only the grand mean null model (i.e. R2) when evaluating predictive ability, which is insufficient to convey a model's predictive power. We evaluated ten null models for human cases of West Nile virus (WNV), a zoonotic mosquito-borne disease introduced to the United States in 1999. The Negative Binomial, Historical (i.e. using previous cases to predict future cases), and Always Absent null models were the strongest overall, and the majority of null models significantly outperformed the grand mean. Longer training time series increased the performance of most null models in US counties where WNV cases were frequent, but improvements were similar across models, so relative scores remained unchanged. We argue that a combination of null models is needed to evaluate the forecasting performance of predictive models for infectious diseases, and that the grand mean is the lowest bar.

Citation: Keyel AC, Kilpatrick AM (2023) Better null models for assessing predictive accuracy of disease models. PLoS ONE 18(5): e0285215. https://doi.org/10.1371/journal.pone.0285215

Editor: Ali R. Ansari, Gulf University for Science and Technology, KUWAIT

Received: December 2, 2022; Accepted: April 17, 2023; Published: May 5, 2023

Copyright: © 2023 Keyel, Kilpatrick. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The human neuroinvasive West Nile virus case data set used here is maintained by the Centers for Disease Control Division of Vector-borne Disease ([email protected]) and is available to qualified researchers, subject to a data use agreement. County population data were obtained from the US Census Bureau and are available in a ready-to-use format in the census.data object from www.github.com/akeyel/wnvdata.

Funding: This publication was supported by cooperative agreement 1U01CK000509-01, funded by the Centers for Disease Control and Prevention and by the National Institutes of Health grant R01AI168097 [ACK] and National Science Foundation grants DEB 1911853, DEB-1717498 and CNH-1115069 [AMK]. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention or the Department of Health and Human Services. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Forecasting infectious disease dynamics is a key challenge for the 21st century [1]. Climate and land use change, combined with the introduction of pathogens to new regions, have created an urgent need for predicting future disease threats [2]. Large data sets and new modeling and statistical techniques have opened up possibilities for ecological forecasting [3]. A key step in the evaluation of predictive models is assessing their improvement over null models. The use of null models to provide a baseline in the absence of specific mechanisms has a long history in ecology [4]. Such baselines are important because predictive models can appear informative while performing no better than a simple, uninformative null model [5, 6]. For example, when dealing with rare events, if a predictive model is outperformed by a null model that predicts the event never occurs, it is not providing much useful information about the process being studied [5].
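
As a concrete (hypothetical) illustration of this pitfall, the short simulation below shows that when cases occur in only 5% of county-years, an Always Absent null is roughly 95% accurate while carrying no information about the disease process; the simulation is ours, not from the paper.

```r
# Hypothetical simulation: rare events make an "always absent" null look accurate.
set.seed(42)
observed <- rbinom(1000, size = 1, prob = 0.05)  # 1 = at least one case that year
always_absent <- rep(0, length(observed))        # null: the event never occurs
mean(always_absent == observed)                  # accuracy is roughly 0.95
```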

West Nile virus (WNV) is an excellent system in which to examine null models in a probabilistic context. WNV is a flavivirus that cycles between mosquito and avian populations [7–9]. WNV was introduced to the United States (US) in 1999 [10] and rapidly spread across the conterminous US and throughout the Americas [11]. As a nationally notifiable disease in the US, long-term data sets (>20 years) exist on human cases [12]. Many models have been built for predicting WNV risk [13], including mechanistic models based on climate and vector data sets [e.g., 14, 15]. Most studies of WNV, and of many other pathogens, have included only a very simplistic null model (e.g. R2, which uses the grand mean of the training data) for assessing model accuracy.
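
To make that baseline concrete, the sketch below (hypothetical counts; one common out-of-sample formulation) shows that R2 implicitly benchmarks a candidate model against a constant prediction at the grand mean of the training data.

```r
# Hypothetical annual case counts: R^2 compares a model's squared error to
# that of the grand mean null derived from the training data.
train <- c(0, 2, 1, 0, 5, 3, 0, 1)            # training years
test  <- c(1, 4, 0)                           # held-out years
model_pred <- c(2, 3, 1)                      # predictions from some candidate model
null_pred  <- rep(mean(train), length(test))  # grand mean null: one constant value
r2 <- 1 - sum((test - model_pred)^2) / sum((test - null_pred)^2)
r2  # > 0 only if the model beats the grand mean null
```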

Our aim was to examine a range of null models (Table 1) to provide guidance on null model selection and performance in disease forecasting for locations with frequent cases (disease present in ≥50% of years) and infrequent cases (disease present, but in <50% of years).

Table 3. Number of years each null model was the top-performing model, for counties where WNV cases were frequent (present > 50% of the time series) or infrequent (present < 50% of the time series).

https://doi.org/10.1371/journal.pone.0285215.t003
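
As a minimal sketch of how two of the stronger nulls might be constructed and scored, assuming the MASS [17] and scoringRules [21] packages cited in the references (the case history and fitting choices are ours, not the paper's exact implementation):

```r
library(MASS)          # fitdistr()
library(scoringRules)  # crps_nbinom(), crps_sample()

past <- c(0, 3, 1, 0, 7, 2, 0, 1, 4, 0)  # hypothetical county case history
observed <- 2                            # cases in the year being forecast

# Negative Binomial null: maximum-likelihood fit to the past counts.
fit <- fitdistr(past, densfun = "negative binomial")
crps_nb <- crps_nbinom(observed, size = fit$estimate[["size"]],
                       mu = fit$estimate[["mu"]])

# Historical null: each past year is an equally likely draw from the forecast.
crps_hist <- crps_sample(observed, dat = past)

c(NegativeBinomial = crps_nb, Historical = crps_hist)  # lower CRPS is better
```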

The length of the training time series had only weak effects on null model performance (Fig 3). For frequent WNV counties, model score improved significantly with the length of the training time series for four of the six models examined, but the effect was similar for all four models (Table 4). For the two remaining models, increasing the length of the training time series produced a non-significant improvement in model score (Pooled Mean Value) or actually made the score worse (Uniform) (Table 4). For infrequent WNV counties, mean null model CRPS did not improve significantly with the length of the training time series for any model, and became significantly worse for the Uniform null (Table 5). Thus, except for the Uniform null, the relative rankings of null models were the same across the full range of time series lengths examined (5 to 17 years; Fig 3).

Fig 3. The Negative Binomial and Historical nulls were generally the top two models (lower CRPS scores correspond to a more accurate model), independent of the length of the time series used to train the models, for both a) counties with frequent WNV cases and b) counties with infrequent WNV cases.

Training years were randomly selected from the entire time series, and a random focal year was selected for evaluation. Only a subset of null models was evaluated over time. Shading indicates a 95% confidence interval for the estimated mean. AA: Always Absent, HN: Historical Null, MV: Mean Value, NB: Negative Binomial, PV: Pooled Value, UN: Uniform.

https://doi.org/10.1371/journal.pone.0285215.g003
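
In code, the resampling scheme described in the legend might look like the following sketch (details such as the exact study period are assumed):

```r
set.seed(1)
years   <- 2000:2017                   # an 18-year series of county case counts
n_train <- 5                           # training time series length under evaluation
train_years <- sample(years, n_train)  # randomly selected training years
focal_year  <- sample(setdiff(years, train_years), 1)  # held-out evaluation year
```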

Table 4. Effect of the length of the training time series on the mean CRPS score for six null models in frequent WNV counties (Fig 3A).

A model with an interaction between null model and time series length had more support than an additive model (ΔAIC = 21, see S1 Table in S1 File for detailed parameter estimates). The table shows the statistics for the slopes for each model (not differences between slopes).

https://doi.org/10.1371/journal.pone.0285215.t004
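
The comparison reported in the note above could be run along these lines (simulated stand-in scores; the actual analysis used the study's CRPS results):

```r
# One mean CRPS value per null model per training length; the question is
# whether the slope over length differs among null models.
set.seed(7)
scores <- expand.grid(model = c("AA", "HN", "MV", "NB", "PV", "UN"),
                      length = 4:16)
scores$crps <- rnorm(nrow(scores), mean = 1, sd = 0.1)  # placeholder scores

m_add <- lm(crps ~ model + length, data = scores)  # common slope across models
m_int <- lm(crps ~ model * length, data = scores)  # model-specific slopes
AIC(m_add) - AIC(m_int)  # a large positive value favors the interaction model
```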

Table 5. Effect of the length of the training time series on the mean CRPS score for six null models in infrequent WNV counties (Fig 3B).

A model with an interaction between null model and time series length had more support than an additive model (ΔAIC = 50, see S2 Table in S1 File for detailed parameter estimates). The table shows the statistics for the slopes for each model (not differences between slopes).

https://doi.org/10.1371/journal.pone.0285215.t005

Discussion

At least five null models significantly outperformed a county-based grand mean and many did far better (Figs 2, 3). A grand mean calculated across all included counties (Pooled Mean model) performed even worse. Thus, when evaluating the performance of new statistical or mechanistic models of disease incidence, there are far better null models than the grand mean (i.e. R2). These null models can be easily calculated for time-series data (e.g., using the probnulls package from GitHub in R), and our results suggest that the length of time series was not critical for developing a robust null model across a range of 4–16 years. The Negative Binomial and Historical nulls were the strongest null models overall (Fig 2), with the Always Absent null performing well where disease cases were infrequent. The strong performance of the Always Absent null in regions where WNV was infrequent (statistically tied with Negative Binomial, Fig 2; top model in 8 of 18 years, Table 3) is a reminder that basic accuracy statistics for rare events can appear high.
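
As one hedged illustration of how easily such baselines can be scored (using scoringRules [21] directly; this is not the probnulls interface):

```r
library(scoringRules)
y_train <- c(0, 2, 0, 1, 6, 0, 3)  # hypothetical training years
y_test  <- c(0, 4, 1)              # hypothetical held-out years
n <- length(y_test)

# Point-forecast nulls: with a single "sample", CRPS reduces to absolute error.
aa <- mean(crps_sample(y_test, dat = matrix(0, n, 1)))              # Always Absent
gm <- mean(crps_sample(y_test, dat = matrix(mean(y_train), n, 1)))  # grand mean
# Historical null: the training years form the empirical forecast distribution.
hn <- mean(crps_sample(y_test,
                       dat = matrix(y_train, n, length(y_train), byrow = TRUE)))
c(AlwaysAbsent = aa, GrandMean = gm, Historical = hn)  # lower mean CRPS is better
```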

The structure and scale of the underlying data may affect the performance of the different null models. The WNV data set used here does not have a clear temporal trend; a strong temporal trend would likely have changed which model performed best. Specifically, null models that use the recent past to predict future cases (e.g. autoregressive models) would perform much better. Seasonal patterns, as examined in recent dengue forecasts [1], could also affect which null model performs best. Future work could explore the performance of different models under different magnitudes of temporal trend and stochastic variation. Many US counties (34%) did not have a neuroinvasive case within the study period; for risk estimates in these counties, fitting models on groups of counties may be necessary [e.g., as in 28]. Additionally, county-annual scales may be more relevant to academic study than to vector control and public health responses [29]. Research on null model performance is needed at finer spatial and temporal scales.

Broadly, null models are seeing increased use in the infectious disease modeling literature. A uniform model and a SARIMA model were used to predict dengue cases as part of a forecasting challenge in Puerto Rico [1]. A random walk and a probabilistic prior-week model were used as null models for forecasting COVID-19 deaths [30], and a modification of a simple AR(1) model was found to perform well for predicting COVID-19 hospitalizations [31, 32].
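
A minimal AR(1) null in that spirit might look like the sketch below (our construction; refs [30–32] describe the variants actually used in those studies):

```r
# Fit an AR(1) to a hypothetical annual case series and forecast one year ahead.
y   <- c(0, 1, 3, 2, 5, 4, 1, 0, 2, 3)
fit <- arima(y, order = c(1, 0, 0))  # AR(1) with intercept, from the stats package
fc  <- predict(fit, n.ahead = 1)
max(0, fc$pred)  # truncate at zero, since case counts cannot be negative
```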

Conclusion

We strongly recommend the inclusion of multiple null models when testing predictive models of vector-borne diseases. A grand mean calculated from the training data set is an inadequate null model given the suite of probabilistic alternatives available. The Negative Binomial and Historical nulls performed especially well for WNV; simple autoregressive models performed moderately well and would likely perform even better for data with temporal trends. The Negative Binomial and Historical null models performed well both when WNV cases were frequent and when they were infrequent, and their relative performance did not depend on the length of the training time series. Researchers proposing mechanistic models should determine whether their models are an improvement over a simple statistical description of historical patterns.

Supporting information

S1 File. Two tables containing full parameter details for the time series length analysis for counties with frequent (S1 Table) and infrequent (S2 Table) WNV cases.

https://doi.org/10.1371/journal.pone.0285215.s001

(DOCX)

Acknowledgments

We thank L. F. Chaves for constructive discussion.

References

1. Johansson MA, Apfeldorf KM, Dobson S, Devita J, Buczak AL, Baugher B, et al. An open challenge to advance probabilistic forecasting for dengue epidemics. Proceedings of the National Academy of Sciences. 2019;116: 24268–24274. pmid:31712420
2. Kilpatrick AM, Randolph SE. Drivers, dynamics, and control of emerging vector-borne zoonotic diseases. Lancet. 2012;380: 1946–1955. pmid:23200503
3. Dietze M. Ecological Forecasting. Princeton University Press; 2017.
4. Gotelli NJ, Graves GR. Null Models in Ecology. 1996.
5. Olden JD, Jackson DA, Peres-Neto PR. Predictive models of fish species distributions: A note on proper validation and chance predictions. Transactions of the American Fisheries Society. 2002;131: 329–336.
6. Beale CM, Lennon JJ, Gimona A. Opening the climate envelope reveals no macroscale associations with climate in European birds. Proceedings of the National Academy of Sciences. 2008;105: 14908–14912. pmid:18815364
7. Work TH, Hurlbut HS, Taylor R. Indigenous wild birds of the Nile Delta as potential West Nile virus circulating reservoirs. The American Journal of Tropical Medicine and Hygiene. 1955;4: 872–888. pmid:13259011
8. Komar N, Langevin S, Hinten S, Nemeth N, Edwards E, Hettler D, et al. Experimental infection of North American birds with the New York 1999 strain of West Nile virus. Emerging Infectious Diseases. 2003;9: 311. pmid:12643825
9. Kilpatrick AM. Globalization, land use, and the invasion of West Nile virus. Science. 2011;334: 323–327. pmid:22021850
10. Lanciotti RS, Roehrig JT, Deubel V, Smith J, Parker M, Steele K, et al. Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science. 1999;286: 2333–2337. pmid:10600742
11. Kramer LD, Ciota AT, Kilpatrick AM. Introduction, spread, and establishment of West Nile virus in the Americas. Journal of Medical Entomology. 2019;56: 1448–1455. pmid:31549719
12. CDC. Nationally notifiable arboviral diseases reported to ArboNET: Data release guidelines. Centers for Disease Control and Prevention; 2019.
13. Barker CM. Models and surveillance systems to detect and predict West Nile virus outbreaks. Journal of Medical Entomology. 2019;56: 1508–1515. pmid:31549727
14. Davis JK, Vincent GP, Hildreth MB, Kightlinger L, Carlson C, Wimberly MC. Improving the prediction of arbovirus outbreaks: A comparison of climate-driven models for West Nile virus in an endemic region of the United States. Acta Tropica. 2018;185: 242–250. pmid:29727611
15. DeFelice NB, Schneider ZD, Little E, Barker C, Caillouet KA, Campbell SR, et al. Use of temperature to improve West Nile virus forecasts. PLOS Computational Biology. 2018;14. pmid:29522514
16. Smith KH, Tyre AJ, Hamik J, Hayes MJ, Zhou Y, Dai L. Using climate to explain and predict West Nile virus risk in Nebraska. GeoHealth. 2020;4: e2020GH000244. pmid:32885112
17. Venables WN, Ripley BD. Modern Applied Statistics with S. Fourth edition. New York: Springer; 2002. Available: http://www.stats.ox.ac.uk/pub/MASS4/
18. Ripley BD. Time series in R 1.5.0. R News. 2002;2: 2–7.
19. US Census Bureau. Intercensal estimates of the resident population for counties and states: April 1, 2000 to July 1, 2010. Suitland, MD: US Census Bureau; 2017. Retrieved from: https://www.census.gov/data/datasets/time-series/demo/popest/intercensal-2000-2010-counties.html
20. US Census Bureau. Population, population change, and estimated components of population change: April 1, 2010 to July 1, 2019 (CO-EST2019-alldata). Suitland, MD: US Census Bureau; 2019. Retrieved from: https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html
21. Jordan A, Krüger F, Lerch S. Evaluating probabilistic forecasts with scoringRules. Journal of Statistical Software. 2019;90: 1–37.
22. Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format. PLOS Computational Biology. 2021;17: e1008618. pmid:33577550
23. Matheson JE, Winkler RL. Scoring rules for continuous probability distributions. Management Science. 1976;22: 1087–1096.
24. Hersbach H. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting. 2000;15: 559–570.
25. Wilks DS. Statistical Methods in the Atmospheric Sciences. Academic Press; 2011.
26. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017. Available: https://www.R-project.org/
27. Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979;6: 65–70.
28. Keyel AC. Patterns of West Nile virus in the Northeastern United States using negative binomial and mechanistic trait-based models. medRxiv. 2022; 2022.11.09.22282143.
29. Keyel AC, Gorris ME, Rochlin I, Uelmen JA, Chaves LF, Hamer GL, et al. A proposed framework for the development and qualitative evaluation of West Nile virus models and their application to local public health decision-making. PLOS Neglected Tropical Diseases. 2021;15: e0009653. pmid:34499656
30. Cramer EY, Ray EL, Lopez VK, Bracher J, Brennen A, Castro Rivadeneira AJ, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proceedings of the National Academy of Sciences. 2022;119: e2113561119. pmid:35394862
31. Olshen AB, Garcia A, Kapphahn KI, Weng Y, Vargo J, Pugliese JA, et al. COVIDNearTerm: A simple method to forecast COVID-19 hospitalizations. Journal of Clinical and Translational Science. 2022;6: e59. pmid:35720970
32. White LA, McCorvie R, Crow D, Jain S, León TM. Assessing the accuracy of California county level COVID-19 hospitalization forecasts to inform public policy decision making. medRxiv. 2022; 2022.11.08.22282086.

